Example: An LSTM for Part-of-Speech Tagging

Verification on a toy dataset

Embedding and hidden dimensions are kept small in the toy example; these will usually be more like 32 or 64 dimensional.

Trained for 300 epochs

Formulation

Vocabulary V (contains each word w_i)

Each word w_i is assigned a tag y_i from the tag set T

The predicted tag for w_i is ŷ_i
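Written out, the prediction rule can be sketched as follows. The symbols h_i (the LSTM hidden state at position i) and A, b (the weights and bias of the final affine map) are notation assumed here for illustration; they are not defined elsewhere in these notes.

```latex
\hat{y}_i = \operatorname*{argmax}_{j} \bigl( \log \operatorname{Softmax}(A h_i + b) \bigr)_j
```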

TODO: link to the source code here

Train the LSTMTagger

Architecture, from the top layer down

Embedding layer

LSTM

The LSTM takes word embeddings as inputs, and outputs hidden states with dimensionality hidden_dim.

Linear layer

that maps from hidden state space to tag space

log_softmax

The model itself computes the formula for ŷ_i only up to the log softmax part

argmax is applied afterwards, when processing the model's predictions
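A minimal sketch of the four layers above as a PyTorch module. The attribute names (word_embeddings, lstm, hidden2tag) and the sizes used at the bottom (6-dimensional embeddings and hidden states, 9 words, 3 tags) are assumptions chosen for illustration, not values taken from these notes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LSTMTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super().__init__()
        # Embedding layer: word index -> dense vector
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # LSTM: word embeddings in, hidden states of size hidden_dim out
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        # Linear layer: hidden state space -> tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)  # (seq_len, embedding_dim)
        # nn.LSTM expects (seq_len, batch, input_size); batch size is 1 here
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        # The model stops at log softmax; argmax is left to the caller
        return F.log_softmax(tag_space, dim=1)  # (seq_len, tagset_size)


# Illustrative sizes only (assumed)
model = LSTMTagger(embedding_dim=6, hidden_dim=6, vocab_size=9, tagset_size=3)
sentence = torch.tensor([0, 1, 2, 3, 4])  # five word indices
tag_scores = model(sentence)
print(tag_scores.shape)
```

Each row of tag_scores holds log-probabilities over the tag set for one word, so exponentiating a row gives values that sum to 1.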

word_to_ix, tag_to_ix

str -> int

The same format seen in the Word2Vec chapter of 『ゼロから作るDeep Learning②』
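A small sketch of how such str -> int mappings might be built. The two sentences and the tag names (DET, NN, V) are toy data assumed for illustration.

```python
# Toy training data (assumed): (list of words, list of tags) pairs
training_data = [
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"]),
]

# Assign each unseen word the next free integer index
word_to_ix = {}
for sentence, _tags in training_data:
    for word in sentence:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)

# The tag set is small enough to write by hand
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}

print(word_to_ix)
```

Note that "The" and "the" get separate indices here because no lowercasing is applied.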

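Check the part-of-speech tagging results before and after training

Here, argmax is applied to the model's return value

Before: the predicted tags are scattered and mostly wrong

After: the model returns the correct tags (training worked)

Loss function: NLLLoss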
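A sketch of the argmax step on its own. The scores below are made-up log-probabilities standing in for the model's output, and ix_to_tag is a hypothetical helper that inverts tag_to_ix.

```python
import torch

tag_to_ix = {"DET": 0, "NN": 1, "V": 2}
# Hypothetical inverse mapping: int -> str
ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}

# Made-up scores for a three-word sentence (one row per word, one column per tag)
tag_scores = torch.log(torch.tensor([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]))

# argmax over the tag dimension picks the most likely tag for each word
predicted_ix = tag_scores.argmax(dim=1)
predicted_tags = [ix_to_tag[ix] for ix in predicted_ix.tolist()]
print(predicted_tags)  # ['DET', 'NN', 'V']
```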

The negative log likelihood loss. It is useful to train a classification problem with C classes.
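To illustrate why NLLLoss pairs with a model that ends in log_softmax, the sketch below compares it against a hand-computed value and against cross entropy on raw scores. The random logits and the targets are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # 4 words, C = 3 classes (made-up scores)
targets = torch.tensor([0, 1, 2, 1])  # gold class index for each word

log_probs = F.log_softmax(logits, dim=1)
loss = nn.NLLLoss()(log_probs, targets)

# By hand: pick the log-probability of the gold class, negate, and average
manual = -log_probs[torch.arange(4), targets].mean()

# log_softmax followed by NLLLoss matches cross entropy on the raw scores
cross_entropy = F.cross_entropy(logits, targets)
print(loss.item(), manual.item(), cross_entropy.item())
```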

Training the model

Step 1. Remember that PyTorch accumulates gradients. We need to clear them out before each instance.

Within each training epoch, gradients are cleared for every sentence (model.zero_grad())

Step 2. Get our inputs ready for the network, that is, turn them into Tensors of word indices.

Use the prepare_sequence function to convert a sentence into an array of integers (word indices)
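A minimal sketch of what prepare_sequence could look like. The signature and the small word_to_ix mapping are assumptions for illustration.

```python
import torch


def prepare_sequence(seq, to_ix):
    """Map a list of strings to a 1-D tensor of integer indices."""
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)


# Assumed toy vocabulary
word_to_ix = {"The": 0, "dog": 1, "ate": 2, "the": 3, "apple": 4}
sentence_in = prepare_sequence("The dog ate the apple".split(), word_to_ix)
print(sentence_in)  # tensor([0, 1, 2, 3, 4])
```

The same function works for tags by passing tag_to_ix instead of word_to_ix. A word missing from the mapping raises KeyError in this sketch.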

Step 3. Run our forward pass.

Step 4. Compute the loss, gradients, and update the parameters by calling optimizer.step().

loss.backward() is also called in this step
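Steps 1 to 4 can be put together as the sketch below. The toy sentences, the 6-dimensional sizes, the SGD optimizer and its learning rate of 0.1 are all assumptions made for illustration; only the 300 epochs and the step order come from these notes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

# Toy data and mappings (assumed)
training_data = [
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"]),
]
word_to_ix = {}
for sent, _ in training_data:
    for word in sent:
        word_to_ix.setdefault(word, len(word_to_ix))
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}


def prepare_sequence(seq, to_ix):
    return torch.tensor([to_ix[w] for w in seq], dtype=torch.long)


class LSTMTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        return F.log_softmax(tag_space, dim=1)


model = LSTMTagger(6, 6, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)  # optimizer choice is assumed

epoch_losses = []
for epoch in range(300):
    total = 0.0
    for sentence, tags in training_data:
        # Step 1: clear gradients accumulated from the previous sentence
        model.zero_grad()
        # Step 2: turn words and tags into tensors of indices
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)
        # Step 3: forward pass
        tag_scores = model(sentence_in)
        # Step 4: loss, gradients, parameter update
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()
        total += loss.item()
    epoch_losses.append(total)

print(epoch_losses[0], epoch_losses[-1])
```

Comparing the first and last entries of epoch_losses gives a quick check that training is making progress before looking at the predicted tags.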